-
Notifications
You must be signed in to change notification settings - Fork 694
store-gateway: receive bucket index metadata from queriers #13942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f7f4e2f to
8c1b022
Compare
Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Vladimir Varankin <[email protected]>
688ca2c to
db59721
Compare
alexweav
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/storegateway/bucket.go
Outdated
| if diff > 0 { | ||
| group = labelDiscoveryDiffNewer | ||
| } else { | ||
| group = labelDiscoveryDiffOlder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our hypothesis that (typically) there should be zero cases, where queriers discover a new version of bucket-index faster than store-gateways
If this is the case, it might be cheaper to just log at warning level when it's unexpectedly older, instead of introducing new metrics that are always recorded but in theory will be entirely in the Newer category?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now as you mentioned this, I wonder if we really need this metric at all — how about we start with only a warning (and a debug for normal case)? WDYT? Ref 31d7e2d
Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Vladimir Varankin <[email protected]>
0616302 to
0db1823
Compare
#### What this PR does Following #13942 The PR fixes the bucket index discovery logging. In the previous one, I've missed that the fields added to the `With` are only applied to the fallback logger. So we don't see these fields in practice 🤦 I've noticed this only now because in one of the cells, where the changes were deployed we do see quite a fair bit of these warnings: <img width="1462" height="225" alt="Screenshot 2026-01-15 at 18 05 53" src="https://github.com/user-attachments/assets/f84144d4-0f67-4b06-a6b0-b15c23ce3ea0" /> #### Which issue(s) this PR fixes or relates to Relates to grafana/mimir-squad#3373 #### Checklist - [ ] Tests updated. - [ ] Documentation added. - [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR. - [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features. Signed-off-by: Vladimir Varankin <[email protected]>
#### What this PR does Following #13942 The PR fixes the bucket index discovery logging. In the previous one, I've missed that the fields added to the `With` are only applied to the fallback logger. So we don't see these fields in practice 🤦 I've noticed this only now because in one of the cells, where the changes were deployed we do see quite a fair bit of these warnings: <img width="1462" height="225" alt="Screenshot 2026-01-15 at 18 05 53" src="https://github.com/user-attachments/assets/f84144d4-0f67-4b06-a6b0-b15c23ce3ea0" /> #### Which issue(s) this PR fixes or relates to Relates to https://github.com/grafana/mimir-squad/issues/3373 #### Checklist - [ ] Tests updated. - [ ] Documentation added. - [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR. - [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features. Signed-off-by: Vladimir Varankin <[email protected]> (cherry picked from commit b268f75)
What this PR does
Following #13875 and #13963
The PR adds a newcortex_bucket_store_bucket_index_discovery_difference_totalmetric, that tracks how often store-gateway receives requests from querier instances, whose bucket-index version is difference from the version, the store-gateway aware of.This is how the debug data looked from a live dev cell (I'm not a huge fan of git-style "ours" vs. "theirs", etc naming; but that's what I came with)
Our hypothesis that (typically) there should be zero cases, where queriers discover a new version of bucket-index faster than store-gateways. To start, the new metric allows us to proof this theory.
I'm thinking that in the future, we may move the passed bucket index from gRPC context to a request's hints to make this (an optional) part of the API contract. The theory is that we can leverage the fact that querier and store-gateway are in agreement about their view a bucket-index (this is still handwave'y for the moment; no concrete details here).
Which issue(s) this PR fixes or relates to
Relates to https://github.com/grafana/mimir-squad/issues/3373